Welcome to week 1 of spatial analysis for public health. This week we will be learning about the process of moving from visualizing spatial data through to exploration and analysis. We will get our hands dirty with some R code and learn how to make beautiful maps. This week’s lecture will focus on some of the concepts behind spatial epidemiology. The code below covers loading and visualizing spatial data in R. A more complete walkthrough of this code can be seen in this video. You will then have a chance to apply that code to new data and questions in this week’s assignment.
The simplest data is a table with coordinates (i.e. point data). For this assignment, we’ll work with malaria prevalence point data from Burkina Faso.
First get the necessary libraries for this week
{% highlight r %} library(sp) library(raster) library(rgdal) library(leaflet) {% endhighlight %}
Import the data
BF_malaria_data <- read.csv("https://raw.githubusercontent.com/HughSt/HughSt.github.io/master/course_materials/week1/Lab_files/Data/BF_malaria_data.csv",
header=T)
The columns should be self-explanatory, but briefly: - examined = numbers tested - positives = of those tested, how many were positive for malaria - longitude = longitude in decimal degrees - latitude = latitude in decimal degrees
head(BF_malaria_data) # gives you the first few rows
## X examined positives longitude latitude
## 1 1 998 153 -5.45000 10.38333
## 2 2 459 262 -5.35000 10.58333
## 3 3 595 72 -5.21667 10.81667
## 4 4 883 81 -5.10000 10.75000
## 5 5 456 170 -5.06270 11.56180
## 6 6 304 264 -5.05000 10.18333
If you want to create a new variable, you can use the $ sign. For example, to create a new column showing infection prevalence (numbers positive / numbers examined)
BF_malaria_data$infection_prevalence <- BF_malaria_data$positives / BF_malaria_data$examined
#Create a histogram of the prevalence
hist(BF_malaria_data$infection_prevalence, breaks=20)
It is possible to use R’s base graphics to plot points, treating them like any other data with x and y coordinates. For example, to get a plot of the points alone
plot(BF_malaria_data$longitude, BF_malaria_data$latitude,
ylab = "Latitude", xlab="Longitude") #boring!
You might want to vary the size of the circle as a function of a variable. For example, if we wanted to plot points with size relative to prevalence we can use the expansion argument cex
# Use the cex function to plot circle size as a function of a variable
plot(BF_malaria_data$longitude, BF_malaria_data$latitude,
cex=BF_malaria_data$infection_prevalence,
ylab = "Latitude", xlab="Longitude (decimal degrees)")
In R, it is sometimes useful to package spatial data up into a ‘Spatial’ class of object using the sp package. This often makes it easier to work with and is often a requirement for other functions. The sp package allows you to put your data into specific spatial objects, such as SpatialPoints or SpatialPolygons. In addition, if your data are more than just the geometry, i.e. if you have data associated with each spatial feature, you can create spatial DataFrames, i.e. SpatialPointsDataFrames and SpatialPolygonsDataFrames. For example, if we wanted to create a SpatalPointsDataFrame using the Burkina Faso data:
BF_malaria_data_SPDF <- SpatialPointsDataFrame(coords = BF_malaria_data[,c("longitude", "latitude")],
data = BF_malaria_data[,c("examined", "positives", "infection_prevalence")],
proj4string = CRS("+init=epsg:4326")) # sets the projection to WGS 1984 using lat/long. Optional but good to specify
# To get a summary of the object
BF_malaria_data_SPDF
## class : SpatialPointsDataFrame
## features : 109
## extent : -5.45, 1.85839, 9.95, 14.9722 (xmin, xmax, ymin, ymax)
## coord. ref. : +init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 3
## names : examined, positives, infection_prevalence
## min values : 56, 2, 0.0020746887966805
## max values : 998, 916, 0.975903614457831
# SPDFs partition data elements, e.g. the coordinates are stored separately from the data
head(BF_malaria_data_SPDF@coords)
## longitude latitude
## [1,] -5.45000 10.38333
## [2,] -5.35000 10.58333
## [3,] -5.21667 10.81667
## [4,] -5.10000 10.75000
## [5,] -5.06270 11.56180
## [6,] -5.05000 10.18333
head(BF_malaria_data_SPDF@data)
## examined positives infection_prevalence
## 1 998 153 0.15330661
## 2 459 262 0.57080610
## 3 595 72 0.12100840
## 4 883 81 0.09173273
## 5 456 170 0.37280702
## 6 304 264 0.86842105
# You can quickly access the data frame as per a standard data frame, e.g.
head(BF_malaria_data_SPDF$infection_prevalence)
## [1] 0.15330661 0.57080610 0.12100840 0.09173273 0.37280702 0.86842105
# You can use the plot or spplot function to get quick plots
plot(BF_malaria_data_SPDF)
spplot(BF_malaria_data_SPDF, zcol = "infection_prevalence")
Let’s have a look at SpatialPolygonsDataFrames. To load a polygon shapefile (or other file types), you can use the readOGR function from the rgdal package. For example, if you wanted to load in the province boundaries for Burkina Faso shapefile BF_Adm_1 from the BF_Adm_1_shapefile folder on GitHub, assuming you have downloaded the folder of files you would use the following command
BF_Adm_1 <- readOGR("BF_Adm_1_shapefile", "BF_Adm_1")
As it happens, admin boundary data is accessible using the getData function from the raster package. Be careful as some other packages also have a getData function, so to specify that you want to use the getData function from the raster package you can use the following code
# You first need the ISO3 codes for the country of interest. You can access these using `ccodes()`. For Burkina Faso, the ISO3 is BFA.
# The getData function then allows you to retrieve the relevant admin level boundaries from GADM.
BF_Adm_1 <- raster::getData("GADM", country="BFA", level=1)
Now we can plot the point data in context
# Plot both country and data points
plot(BF_Adm_1)
points(BF_malaria_data$longitude, BF_malaria_data$latitude,
cex=BF_malaria_data$infection_prevalence,
ylab = "Latitude", xlab="Longitude",
pch=16)
Rather than just relying on R base graphics, we can easily create webmaps using the leaflet package. There are many basemaps available. See here. For any map, identify the Provider name, e.g. “OpenStreetMap.Mapnik”, by clicking on the map.
# Define your basemap
basemap <- leaflet() %>% addTiles()
basemap
# Or choose another basemap
basemap <- leaflet() %>% addProviderTiles("Esri.WorldImagery")
basemap
#Let's choose a simple one
basemap <- leaflet() %>% addProviderTiles("CartoDB.Positron")
You can use the ‘piping’ command %>% to add layers. As our point and polygon data are already ‘Spatial’ object this is easy
basemap %>% addPolygons(data=BF_Adm_1)
# to change the colors/line weight
basemap %>% addPolygons(data=BF_Adm_1, color = "red",
weight = 1, fillOpacity = 0.2)
#You can also add popups
basemap %>% addPolygons(data=BF_Adm_1,
popup = BF_Adm_1$NAME_1)
# If you want to add points as well
basemap %>% addPolygons(data=BF_Adm_1, weight = 2,
popup = BF_Adm_1$NAME_1) %>%
addCircleMarkers(data=BF_malaria_data_SPDF,
color="red", radius = 2)